


In [2]:

    
import graphlab



In [3]:

    
image_train = graphlab.SFrame('image_train_data/')









    



This non-commercial license of GraphLab Create for academic use is assigned to y_xwang@163.com and will expire on March 13, 2018.






    



[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1489783636.log



In [4]:

    
image_test = graphlab.SFrame('image_test_data/')



In [5]:

    
graphlab.canvas.set_target('ipynb')



In [6]:

    
image_train['image'].show()

Train a classifier on the raw image pixels



In [7]:

    
raw_pixel_model = graphlab.logistic_classifier.create(image_train, target='label', features=['image_array'])









    



PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set.

Logistic regression:
--------------------------------------------------------
Number of examples          : 1890
Number of classes           : 4
Number of feature columns   : 1
Number of unpacked features : 3072
Number of coefficients      : 9219
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 1         | 6        | 0.000004  | 5.984170     | 0.259788          | 0.165217            |
| 2         | 9        | 5.000000  | 8.711652     | 0.404762          | 0.373913            |
| 3         | 10       | 5.000000  | 10.275751    | 0.441270          | 0.408696            |
| 4         | 11       | 5.000000  | 11.783001    | 0.297884          | 0.286957            |
| 5         | 13       | 1.000000  | 14.188616    | 0.402646          | 0.365217            |
| 6         | 14       | 1.000000  | 15.671265    | 0.235979          | 0.278261            |
| 7         | 16       | 1.000000  | 18.304505    | 0.454497          | 0.417391            |
| 8         | 17       | 1.000000  | 19.766691    | 0.457143          | 0.417391            |
| 9         | 18       | 1.000000  | 21.504897    | 0.461905          | 0.408696            |
| 10        | 19       | 1.000000  | 23.040814    | 0.471958          | 0.391304            |
+-----------+----------+-----------+--------------+-------------------+---------------------+
TERMINATED: Iteration limit reached.
This model may not be optimal. To improve it, consider increasing `max_iterations`.
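The coefficient count in the log can be checked by hand. This is a minimal sketch, assuming the classifier treats one class as the reference and learns one weight per feature plus an intercept for each remaining class. The helper name `count_coefficients` is made up for illustration and is not part of GraphLab Create.

```python
def count_coefficients(num_classes, num_features):
    """Weights plus one intercept for every non-reference class (assumed layout)."""
    return (num_classes - 1) * (num_features + 1)


# Values taken from the training log: 4 classes, 3072 unpacked pixel features.
raw_pixel_coefficients = count_coefficients(num_classes=4, num_features=3072)
print(raw_pixel_coefficients)  # 9219
```

With 9219 coefficients and only 1890 training rows, the model has far more parameters than examples, which is the situation the regularization warning describes.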

Predict labels for the first three test images



In [8]:

    
image_test[0:3]['image'].show()



In [9]:

    
image_test[0:3]['label']









    Out[9]:





dtype: str
Rows: 3
['cat', 'automobile', 'cat']



In [10]:

    
raw_pixel_model.predict(image_test[0:3])









    Out[10]:





dtype: str
Rows: 3
['bird', 'cat', 'bird']
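None of the three raw-pixel predictions matches the true labels. The sketch below makes that comparison explicit with plain Python lists copied from Out[9] and Out[10]. `fraction_correct` is a hypothetical helper, not a GraphLab function.

```python
def fraction_correct(true_labels, predicted_labels):
    """Share of positions where the prediction equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / float(len(true_labels))


true_labels = ['cat', 'automobile', 'cat']        # Out[9]
raw_pixel_predictions = ['bird', 'cat', 'bird']   # Out[10]
print(fraction_correct(true_labels, raw_pixel_predictions))  # 0.0
```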

Evaluate the model on the full test set



In [11]:

    
raw_pixel_model.evaluate(image_test)









    Out[11]:





{'accuracy': 0.431, 'auc': 0.7134548333333341, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 16
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |     bird     |       dog       |  213  |
 |     dog      |       cat       |   95  |
 |     bird     |    automobile   |   35  |
 |  automobile  |    automobile   |  371  |
 |     cat      |       dog       |  384  |
 |     dog      |       dog       |  488  |
 |     bird     |       bird      |  715  |
 |  automobile  |       bird      |  346  |
 |     bird     |       cat       |   37  |
 |  automobile  |       cat       |   71  |
 +--------------+-----------------+-------+
 [16 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'f1_score': 0.4114369755764029, 'log_loss': 1.2646741643055923, 'precision': 0.48806633476245403, 'recall': 0.43099999999999994, 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 	class	int
 
 Rows: 400004
 
 Data:
 +-----------+-----+-----+------+------+-------+
 | threshold | fpr | tpr |  p   |  n   | class |
 +-----------+-----+-----+------+------+-------+
 |    0.0    | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   1e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   2e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   3e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   4e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   5e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   6e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   7e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   8e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   9e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 +-----------+-----+-----+------+------+-------+
 [400004 rows x 6 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
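To put the 0.431 accuracy in context: the ROC table lists p = 1000 positives and n = 3000 negatives for class 0, so the test set has 4000 images. Assuming the other three classes also have 1000 images each (the output only shows class 0), always guessing a single class would score 0.25. A small sketch of that arithmetic:

```python
test_set_size = 1000 + 3000      # p + n from the roc_curve table
num_classes = 4
reported_accuracy = 0.431        # 'accuracy' in Out[11]

# Assumption: classes are balanced, so a constant guess is right 1/4 of the time.
chance_accuracy = 1.0 / num_classes
correct_images = int(round(reported_accuracy * test_set_size))

print(chance_accuracy)   # 0.25
print(correct_images)    # 1724
```

The raw-pixel model is therefore better than a constant guess, but it still mislabels more than half of the test images.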

Can we use deep features to improve the model?



In [12]:

    
len(image_train)









    Out[12]:





2005
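The training logs in this notebook report fewer examples than the 2005 rows counted here because `create` holds out roughly 5 percent of the rows as a validation set, as its PROGRESS message says. A quick check of that arithmetic, using the "Number of examples" values from the two training logs:

```python
total_rows = 2005                       # len(image_train)
rows_used = {'raw_pixel_model': 1890,   # "Number of examples" in each log
             'deep_features_model': 1910}

for name, used in sorted(rows_used.items()):
    held_out = total_rows - used
    share = held_out / float(total_rows)
    print('%s: %d validation rows (%.1f%%)' % (name, held_out, 100 * share))
```

The two held-out counts differ (115 and 95 rows), which suggests the split is drawn at random on each call rather than fixed.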



In [14]:

    
#deep_learning_model = graphlab.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')
image_train.head()









    Out[14]:





    
+-----+----------------------+------------+-------------------------------------------------------+--------------------------------------------------------+
| id  | image                | label      | deep_features                                         | image_array                                            |
+-----+----------------------+------------+-------------------------------------------------------+--------------------------------------------------------+
| 24  | Height: 32 Width: 32 | bird       | [0.242871761322, 1.09545373917, 0.0, ...              | [73.0, 77.0, 58.0, 71.0, 68.0, 50.0, 77.0, 69.0, ...   |
| 33  | Height: 32 Width: 32 | cat        | [0.525087952614, 0.0, 0.0, 0.0, 0.0, 0.0, ...         | [7.0, 5.0, 8.0, 7.0, 5.0, 8.0, 5.0, 4.0, 6.0, 7.0, ... |
| 36  | Height: 32 Width: 32 | cat        | [0.566015958786, 0.0, 0.0, 0.0, 0.0, 0.0, ...         | [169.0, 122.0, 65.0, 131.0, 108.0, 75.0, ...           |
| 70  | Height: 32 Width: 32 | dog        | [1.12979578972, 0.0, 0.0, 0.778194487095, 0.0, ...    | [154.0, 179.0, 152.0, 159.0, 183.0, 157.0, ...         |
| 90  | Height: 32 Width: 32 | bird       | [1.71786928177, 0.0, 0.0, 0.0, 0.0, 0.0, ...          | [216.0, 195.0, 180.0, 201.0, 178.0, 160.0, ...         |
| 97  | Height: 32 Width: 32 | automobile | [1.57818555832, 0.0, 0.0, 0.0, 0.0, 0.0, ...          | [33.0, 44.0, 27.0, 29.0, 44.0, 31.0, 32.0, 45.0, ...   |
| 107 | Height: 32 Width: 32 | dog        | [0.0, 0.0, 0.220677852631, 0.0, ...                   | [97.0, 51.0, 31.0, 104.0, 58.0, 38.0, 107.0, 61.0, ... |
| 121 | Height: 32 Width: 32 | bird       | [0.0, 0.23753464222, 0.0, 0.0, 0.0, 0.0, ...          | [93.0, 96.0, 88.0, 102.0, 106.0, 97.0, 117.0, ...      |
| 136 | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.5737862587, 0.0, ... | [35.0, 59.0, 53.0, 36.0, 56.0, 56.0, 42.0, 62.0, ...   |
| 138 | Height: 32 Width: 32 | bird       | [0.658935725689, 0.0, 0.0, 0.0, 0.0, 0.0, ...         | [205.0, 193.0, 195.0, 200.0, 187.0, 193.0, ...         |
+-----+----------------------+------------+-------------------------------------------------------+--------------------------------------------------------+

[10 rows x 5 columns]
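The `image_array` column holds each 32 x 32 image as a flat list of numbers. Assuming three colour channels per pixel (the table does not state this, but 32 * 32 * 3 matches the 3072 unpacked features in the raw-pixel training log), the flattening can be sketched in plain Python. The pixel values and the row-major, channel-interleaved order below are illustrative assumptions, not something the notebook confirms.

```python
height, width, channels = 32, 32, 3   # channels = 3 is an assumption (RGB)

# Hypothetical image: every pixel is the same (R, G, B) triple.
image = [[(73.0, 77.0, 58.0) for _ in range(width)] for _ in range(height)]

# Flatten row by row, pixel by pixel, channel by channel.
flat = [value for row in image for pixel in row for value in pixel]

print(len(flat))   # 3072
```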

Given the deep features, let's train a classifier



In [15]:

    
deep_features_model = graphlab.logistic_classifier.create(image_train, target='label', features=['deep_features'])









    



PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set.

WARNING: Detected extremely low variance for feature(s) 'deep_features' because all entries are nearly the same.
Proceeding with model training using all features. If the model does not provide results of adequate quality, exclude the above mentioned feature(s) from the input dataset.

Logistic regression:
--------------------------------------------------------
Number of examples          : 1910
Number of classes           : 4
Number of feature columns   : 1
Number of unpacked features : 4096
Number of coefficients      : 12291
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 1         | 5        | 0.000131  | 5.387331     | 0.765445          | 0.736842            |
| 2         | 9        | 0.250000  | 9.727010     | 0.770157          | 0.715789            |
| 3         | 10       | 0.250000  | 11.121171    | 0.773298          | 0.726316            |
| 4         | 11       | 0.250000  | 12.512048    | 0.780105          | 0.736842            |
| 5         | 12       | 0.250000  | 13.866592    | 0.790576          | 0.736842            |
| 6         | 13       | 0.250000  | 15.268560    | 0.804188          | 0.757895            |
| 7         | 14       | 0.250000  | 16.752407    | 0.826702          | 0.747368            |
| 8         | 15       | 0.250000  | 18.138179    | 0.840314          | 0.747368            |
| 9         | 16       | 0.250000  | 19.552293    | 0.858639          | 0.747368            |
| 10        | 17       | 0.250000  | 20.984161    | 0.879581          | 0.747368            |
+-----------+----------+-----------+--------------+-------------------+---------------------+
TERMINATED: Iteration limit reached.
This model may not be optimal. To improve it, consider increasing `max_iterations`.
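In this log, training accuracy is still rising at iteration 10 while validation accuracy stalls near 0.75, a pattern that often points to overfitting. The sketch below copies the two accuracy columns from the table and reports the train/validation gap and the iteration with the best validation score. The variable names are illustrative only.

```python
training = [0.765445, 0.770157, 0.773298, 0.780105, 0.790576,
            0.804188, 0.826702, 0.840314, 0.858639, 0.879581]
validation = [0.736842, 0.715789, 0.726316, 0.736842, 0.736842,
              0.757895, 0.747368, 0.747368, 0.747368, 0.747368]

# Gap between training and validation accuracy at each iteration.
gaps = [t - v for t, v in zip(training, validation)]
best_iteration = validation.index(max(validation)) + 1  # iterations are 1-based

print('best validation accuracy at iteration %d' % best_iteration)
print('gap at iteration 1:  %.3f' % gaps[0])
print('gap at iteration 10: %.3f' % gaps[-1])
```

The validation set here has only 95 rows, so differences of one or two percentage points between iterations should be read with caution.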



In [16]:

    
deep_features_model.predict(image_test[0:3])









    Out[16]:





dtype: str
Rows: 3
['cat', 'automobile', 'cat']



In [17]:

    
deep_features_model.evaluate(image_test)









    Out[17]:





{'accuracy': 0.7915, 'auc': 0.9418345833333305, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 16
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |  automobile  |       cat       |   13  |
 |     bird     |       dog       |   66  |
 |     cat      |       bird      |   63  |
 |  automobile  |       dog       |   7   |
 |     cat      |    automobile   |   33  |
 |     dog      |       bird      |   35  |
 |     bird     |       cat       |  117  |
 |     dog      |    automobile   |   18  |
 |     dog      |       dog       |  740  |
 |     cat      |       dog       |  229  |
 +--------------+-----------------+-------+
 [16 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'f1_score': 0.7917247987795695, 'log_loss': 0.5598031974693009, 'precision': 0.7932749787417366, 'recall': 0.7915, 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 	class	int
 
 Rows: 400004
 
 Data:
 +-----------+----------------+-----+------+------+-------+
 | threshold |      fpr       | tpr |  p   |  n   | class |
 +-----------+----------------+-----+------+------+-------+
 |    0.0    |      1.0       | 1.0 | 1000 | 3000 |   0   |
 |   1e-05   | 0.974333333333 | 1.0 | 1000 | 3000 |   0   |
 |   2e-05   | 0.964333333333 | 1.0 | 1000 | 3000 |   0   |
 |   3e-05   | 0.956333333333 | 1.0 | 1000 | 3000 |   0   |
 |   4e-05   | 0.948333333333 | 1.0 | 1000 | 3000 |   0   |
 |   5e-05   |     0.945      | 1.0 | 1000 | 3000 |   0   |
 |   6e-05   | 0.939666666667 | 1.0 | 1000 | 3000 |   0   |
 |   7e-05   |     0.934      | 1.0 | 1000 | 3000 |   0   |
 |   8e-05   | 0.931666666667 | 1.0 | 1000 | 3000 |   0   |
 |   9e-05   | 0.929333333333 | 1.0 | 1000 | 3000 |   0   |
 +-----------+----------------+-----+------+------+-------+
 [400004 rows x 6 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
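Putting the two evaluations side by side, deep features lift test accuracy from 0.431 to 0.7915. A short calculation of the error rates and the relative error reduction, using only the 'accuracy' values reported in Out[11] and Out[17]:

```python
raw_pixel_accuracy = 0.431        # Out[11]
deep_features_accuracy = 0.7915   # Out[17]

raw_pixel_error = 1.0 - raw_pixel_accuracy
deep_features_error = 1.0 - deep_features_accuracy
relative_error_reduction = (raw_pixel_error - deep_features_error) / raw_pixel_error

print('raw pixel error:     %.4f' % raw_pixel_error)
print('deep features error: %.4f' % deep_features_error)
print('relative reduction:  %.1f%%' % (100 * relative_error_reduction))
```

By this measure the deep-features model removes a little under two thirds of the raw-pixel model's test errors.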


